
Minimax Optimal Algorithms for Unconstrained Linear Optimization


Abstract

We design and analyze minimax-optimal algorithms for online linear optimization games where the player's choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. The standard benchmark is the loss of the best strategy chosen from a bounded comparator set. When the comparison set and the adversary's gradients satisfy L_infinity bounds, we give the value of the game in closed form and prove it approaches sqrt(2T/pi) as T -> infinity.

Interesting algorithms result when we consider soft constraints on the comparator, rather than restricting it to a bounded set. As a warmup, we analyze the game with a quadratic penalty. The value of this game is exactly T/2, and this value is achieved by perhaps the simplest online algorithm of all: unprojected gradient descent with a constant learning rate.

We then derive a minimax-optimal algorithm for a much softer penalty function. This algorithm achieves good bounds under the standard notion of regret for any comparator point, without needing to specify the comparator set in advance. The value of this game converges to sqrt(e) as T -> infinity; we give a closed form for the exact value as a function of T. The resulting algorithm is natural in unconstrained investment or betting scenarios, since it guarantees at worst constant loss, while allowing for exponential reward against an "easy" adversary.
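The quadratic-penalty claim can be checked numerically. The sketch below is an illustration, not code from the paper: the one-dimensional setting, the `g_t = sign(x_t)` adversary, and the step size `eta = 1` are assumptions consistent with the abstract. It plays unprojected gradient descent on linear losses `g_t * x_t` with gradients in [-1, 1], scored against the soft benchmark `min_u (G*u + u^2/2) = -G^2/2`, where `G` is the sum of the gradients:

```python
import random

def penalized_regret(grads):
    """Unprojected gradient descent x_{t+1} = x_t - g_t (constant step eta = 1)
    on linear losses g_t * x_t, scored against the quadratic-penalty benchmark
    min_u (G*u + u**2/2) = -G**2/2, where G = sum of gradients."""
    x, loss, G = 0.0, 0.0, 0.0
    for g in grads:
        loss += g * x      # player suffers linear loss g_t * x_t
        G += g
        x -= g             # gradient step with eta = 1, no projection
    return loss + G * G / 2.0   # player's loss minus the benchmark (-G^2/2)

def worst_case_value(T):
    """An adaptive adversary playing g_t = sign(x_t); for even T this drives
    the penalized regret of gradient descent to exactly T/2."""
    x, loss, G = 0.0, 0.0, 0.0
    for _ in range(T):
        g = 1.0 if x >= 0 else -1.0   # maximize the player's per-round loss
        loss += g * x
        G += g
        x -= g
    return loss + G * G / 2.0

print(worst_case_value(100))   # 50.0, i.e. exactly T/2
rand_grads = [random.uniform(-1, 1) for _ in range(100)]
print(penalized_regret(rand_grads) <= 50.0 + 1e-9)   # True: GD never exceeds T/2
```

The standard telescoping analysis of gradient descent gives penalized regret at most `sum_t g_t**2 / 2 <= T/2` for any gradient sequence bounded by 1, which matches the game value stated in the abstract.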

Bibliographic Information

  • Author

    McMahan, H. Brendan;

  • Affiliation
  • Year 2013
  • Pages
  • Format PDF
  • Language English
  • CLC Classification

